Abstract — In this paper, we consider delay-optimal charging
scheduling of electric vehicles (EVs) at a charging station
with multiple charge points. The charging station is equipped
with renewable energy generation devices and can also buy
energy from the power grid. The uncertainty of the EV arrival,
the intermittence of the renewable energy, and the variation of 
the grid power price are taken into account and described as 
independent Markov processes. Meanwhile, the charging energy 
for each EV is random. The goal is to minimize the mean 
waiting time of EVs under the long term constraint on the 
cost. We propose queue mapping to convert the EV queue to 
the charge demand queue and prove the equivalence between 
the minimization of the two queues' average length. Then we 
focus on the minimization for the average length of the charge 
demand queue under long term cost constraint. We propose a 
framework of Markov decision process (MDP) to investigate this 
scheduling problem. The system state includes the charge demand 
queue length, the charge demand arrival, the energy level in the 
storage battery of the renewable energy, the renewable energy 
arrival, and the grid power price. Additionally the number of 
charging demands and the allocated energy from the storage 
battery compose the two-dimensional policy. We derive two 
necessary conditions of the optimal policy. Moreover, we discuss 
the reduction of the two-dimensional policy to be the number 
of charging demands only. We give the sets of system states 
for which charging no demand and charging as many demands 
as possible are optimal, respectively. Finally we investigate the 
proposed radical policy and conservative policy numerically. 

Index Terms — Electric vehicle, charging scheduling, renewable 
energy, Markov decision process. 

I. Introduction 

As an important means of mitigating the shortage of fossil
fuels and severe environmental problems, electric vehicle
(EV) technology has attracted much interest in recent years.
Compared to conventional vehicles, EVs have advantages in the
following aspects: energy efficiency, eco-effect, performance
benefits, and energy independence [1]. However, a fuel-driven
vehicle can produce less CO2 than an EV if the charging energy
is entirely produced by coal-fired power plants [2]. Thus,
renewable energy (e.g., solar or wind energy [3]) should be
the energy source of the EVs, fully or at least partially, to
achieve real environmental advantages.

Since EVs are propelled by an electric motor (or motors)
that is powered by rechargeable battery packs, EVs need to
be charged periodically. EV charging has therefore become an
important topic [4], [5]. In particular, there are some works
on the scheduling of EV charging in the literature [6]-[20].

In [6], the EV battery charging behavior was optimized with
the objective of minimizing charging costs, achieving
satisfactory state-of-energy levels, and optimal power
balancing. In [7], the problem of optimizing the plug-in hybrid
electric vehicle (PHEV) charge trajectory (i.e., timing and rate
of the charging) was studied to reduce the energy cost and
battery degradation. For the purpose of improving the
satisfiability of EVs, a reservation-based scheduling algorithm
for the charging station to decide the service order of multiple
requests was proposed in [8]. In [9], a joint optimal power flow
(OPF)-charging (dynamic) optimization problem was formulated
with the goal of minimizing the generation and charging costs
while satisfying the network, physical and inelastic-load
constraints. In [10], utilizing particle swarm optimization, a
proposed algorithm optimally manages a large number of PHEVs
charging at a municipal parking station. In [11], the
minimization of the waiting time for EV charging via scheduling
charging activities spatially and temporally in a large-scale
road network was investigated. By modeling an EV charging system
as a cyber-physical system, a decentralised online EV charging
scheduling scheme was developed in [12]. In [13], the authors
formulated the EV charging scheduling problem to fill the
electric load valley as an optimal control problem, and a
decentralized algorithm was derived. In [14], a strategy to
coordinate the charging of plug-in EVs (PEVs) was proposed
by using non-cooperative games [15]. Flexible charging
optimization for EVs considering distribution grid constraints,
both voltage and power, was investigated in [16]. In [17],
the trade-off between distribution system load and quality
of charging service was considered, and centralized algorithms
to schedule the charging of vehicles were designed. In
[18] and [19], real-time scheduling policies of EV charging
were considered when both renewable energy and energy
from the grid are available. In [20], the PEV charging and
wind power scheduling were integrated, and a synergistic
control algorithm of plug-in vehicle charging and wind power
scheduling was proposed.

In this paper, we focus on the scheduling approach of
EV charging at a charging station. The charging station has
multiple charge points and is equipped with renewable energy
generation devices and a storage battery. The charged energy at
a charge point during a period is constant and is called an
energy block. We model the arrival of the renewable energy
as a Markov chain. The charging energy can also be purchased
from the power grid, and the price changes according to
another Markov chain. The arrival of the EVs is assumed to be a
Markov process. Once an EV arrives at the charging station, it
waits in a queue before charging. In each period, the charging
station chooses some EVs from the head of the queue for
charging. Meanwhile, the station also determines how much
energy is supplied from the storage battery (the rest of the
required energy is supplied from the power grid). The objective
is to minimize the mean waiting time of EVs under the long
term cost constraint.

Since the amount of charging energy (i.e., the number
of energy blocks to charge) for each EV is random, the
scheduling problem is very challenging. We propose a queue
mapping method to overcome this difficulty. We map the EV queue
to a charge demand queue. In the charge demand queue,
each demand means an energy block that needs to be charged, and
some consecutive demands correspond to an EV's required
charging energy. We prove that the minimization of the average
EV queue length is equivalent to the minimization of the 
average charge demand queue length. Then we focus on the 
charge demand queue minimization under the cost constraint. 
The scheduling problem can be equivalently reconstructed as 
follows. The demand arrives according to a discrete-time batch 
Markovian arrival process (D-BMAP) and waits in the charge 
demand queue before service (charging). In each period, the 
charging station chooses some demands from the head of the 
charge demand queue for charging. Meanwhile, the station 
also determines how much energy is supplied from the storage 
battery (the rest of the required energy is supplied from the 
power grid). The objective is minimizing the mean length of 
the charge demand queue under the long term cost constraint. 

Next, we find that the reconstructed optimization problem 
can be studied under a Markov decision process (MDP) 
framework. The system state contains the charge demand 
queue length, the demand arrival, the energy level in the 
storage battery of the renewable energy, the renewable energy 
arrival, and the grid power price. Meanwhile, the number of 
charging demands and the allocated energy from the storage 
battery constitute the two-dimensional policy. We find that 
the general case of the reconstructed optimization problem 
can be analyzed similarly as the analysis of a special case. 
Then we focus on the analysis of the special case that
is formulated as a constrained MDP [21]. We analyze the
optimal two-dimensional policy of the constrained MDP by
transforming it to an average cost MDP and then to its
corresponding discount cost MDP. First, the constrained MDP
is converted to an unconstrained MDP by using Lagrangian
relaxation. Moreover, we derive that the optimal solution of
the unconstrained MDP with a certain Lagrangian multiplier
is optimal for the original constrained MDP. Next, the
unconstrained MDP can be analyzed by transforming to its 
corresponding discount cost MDP. We obtain two necessary 
conditions for the optimal solution. Third, we analyze the 
relations between the two elements of the two-dimensional 
policy, and find that the number of charging demands$^{1}$ is
dominant. Thus, we propose a conjecture that the constrained
MDP problem can be reduced to an MDP problem with the
policy being the number of charging demands only. We then
derive the conditions on the system state under which
charging no demand is optimal. We also obtain the system
state conditions under which charging as many demands as
possible is optimal.

1 In the special case, we can use "EV" and "demand" interchangeably.

Fig. 1. System model. [Figure labels: Renewable energy; Storage
battery; Energy from the power grid; A charging station with M
charge points (charge capacity = M).]

The rest of the paper is structured as follows. In Section 
II, the system model is described and we formulate an op- 
timization problem that can be studied under the framework 
of MDP. Section III presents a special case of the formulated
optimization problem as a constrained MDP to demonstrate
the solving process of the general case. Next, we analyze
the optimal policy of the constrained MDP in Section IV. In
Section V, the numerical results are presented. Finally, Section
VI concludes the paper. 

II. System model and problem formulation 

Time is divided into periods of length $\tau$ each. The EVs
arrive at the charging station according to a finite-state ergodic
Markov chain $\{A[n]\}$. The EVs wait in a queue before
charging, as illustrated in Fig. 1. The charging station has $M$
charge points, i.e., at most $M$ EVs can be charged in each
period. The charging station has renewable energy generation
devices, and it can also get power from the power grid. The
renewable energy is modeled as another finite-state ergodic
Markov process $\{E_a[n]\}$. The renewable energy is viewed
as free, and the price for the grid power during the $n$-th
period is denoted as $P[n]$. The grid power price remains static
during each period and changes between different periods. The
sequence of the price, $\{P[n]\}$, is a finite-state ergodic Markov
chain. We assume that the charged energy at one charge
point during a period is constant, and is denoted as $\mathcal{E}$.$^{2}$ In
the $n$-th period, $k[n]$ EVs from the head of the EV queue
are allowed to charge. During the $n$-th period, the charging
station allocates $w[n]$ power from the storage battery, and the
remaining power is supplied by the power grid. Assume that
the required charging energies of the EVs, $E_c$, are independent of
each other, and $E_c = L\mathcal{E}$ with $L$ being uniformly distributed
in $[1, 2, \ldots, C]$,$^{3}$ i.e., $L \sim U[1, \ldots, C]$.

2 It is assumed that if an EV utilizes $m$ charge points during a period, the
amount of charged energy is $m\mathcal{E}$.

Direct analysis of the EV queue length under the long term
cost constraint is difficult due to the randomness of $L$. We
propose the queue mapping method as shown in Fig. 2. Each
EV in the EV queue corresponds to several consecutive charge
demands (the number of the demands denotes the amount of
required energy) in the charge demand queue.$^{4}$ The number
of EVs at the beginning of the $n$-th period is $Q[n]$, and the
length of the charge demand queue is denoted as $Q_e[n]$. We
convert the average EV queue length minimization to the
average charge demand queue length minimization. Furthermore, we
will prove that they are equivalent.
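The queue mapping can be illustrated with a small sketch. It is not taken from the paper: the function names `map_ev_queue_to_demand_queue` and `serve_demands` are hypothetical, and each demand is simply tagged with the index of the EV it belongs to, so that consecutive equal tags form one EV's block of demands.

```python
from collections import deque


def map_ev_queue_to_demand_queue(ev_blocks):
    """Expand each EV's required number of energy blocks into unit demands.

    ev_blocks[i] is the number of energy blocks (L_i) needed by the i-th EV,
    head of the queue first. The result holds one entry per demand, tagged
    with the index of the EV that owns it.
    """
    demand_queue = deque()
    for ev_index, blocks in enumerate(ev_blocks):
        if blocks < 1:
            raise ValueError("each EV must require at least one energy block")
        demand_queue.extend([ev_index] * blocks)
    return demand_queue


def serve_demands(demand_queue, k):
    """Charge k demands from the head; return EVs whose last demand was served."""
    finished = []
    for _ in range(min(k, len(demand_queue))):
        ev = demand_queue.popleft()
        # An EV is fully charged once no demand with its tag heads the queue.
        if not demand_queue or demand_queue[0] != ev:
            finished.append(ev)
    return finished


# EV 0 needs 3 blocks and EV 1 needs 2 blocks, so the demand queue has length 5.
queue = map_ev_queue_to_demand_queue([3, 2])
done = serve_demands(queue, 4)  # EV 0 finishes; one demand of EV 1 remains
```

Serving demands strictly from the head keeps the EV order intact, which is what lets the two queues be compared period by period.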

The demand arrival can be given by $A_e[n] = \sum_{i=1}^{A[n]} L_i$,
where $L_i \sim U[1, \ldots, C]$.

Remark: As $\{A[n]\}$ is a Markov chain, we can derive that
$\{A_e[n]\}$ is a D-BMAP.

In the $n$-th period, $k_e[n]$ demands from the head of the
charge demand queue are allowed to charge. During the $n$-th
period, the charging station allocates $w[n]$ power from the
storage battery, and the remaining power is supplied by the
power grid. Denote the number of charged demands in the $n$-th
period as $K_e[n]$. The evolution of the charge demand queue
length, $Q_e[n]$, is $Q_e[n+1] = Q_e[n] - K_e[n] + A_e[n]$. Denote
the capacity of the renewable energy storage battery as $E_{\max}$.
The stored battery energy at the beginning of the $n$-th period
is $E_b[n]$. The battery energy evolution can be expressed as

$$E_b[n+1] = \min\{E_b[n] - W[n]\tau + E_a[n], E_{\max}\} := (E_b[n] - W[n]\tau + E_a[n])^{-}. \quad (1)$$

The cost in the $n$-th period is $C_e[n] = \left(\frac{K_e[n]\mathcal{E}}{\tau} - W[n]\right)^{+} P[n]$.
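As a rough illustration of the dynamics above, the following sketch advances the charge demand queue and the battery by one period and returns the cost of that period. The function name `step` and its argument names are assumptions made for this example; the positive part in the cost is taken as $\max\{\cdot, 0\}$.

```python
def step(q_e, e_b, k, w, arrivals, e_a, price, tau, block, e_max):
    """Advance the demand queue and battery by one period.

    q_e: demand queue length, e_b: stored battery energy,
    k: demands charged, w: battery power allocated,
    arrivals: new demands, e_a: renewable energy arrival,
    price: grid price, tau: period length, block: energy per demand,
    e_max: battery capacity.
    Returns (next queue length, next battery energy, cost of this period).
    """
    if not 0 <= k <= q_e:
        raise ValueError("cannot charge more demands than are queued")
    if not 0 <= w * tau <= e_b:
        raise ValueError("battery cannot supply more energy than it stores")
    grid_power = max(k * block / tau - w, 0.0)   # power bought from the grid
    cost = grid_power * price
    next_q = q_e - k + arrivals                  # queue length evolution
    next_e_b = min(e_b - w * tau + e_a, e_max)   # overflow above e_max is lost
    return next_q, next_e_b, cost


# Four demands of 10 energy units each, 30 units taken from the battery.
next_q, next_e_b, cost = step(q_e=5, e_b=30, k=4, w=30, arrivals=2,
                              e_a=50, price=10, tau=1, block=10, e_max=100)
```

In this call the grid has to cover 10 units of power at price 10, and the emptied battery is refilled only by the renewable arrival.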

3 $C$ is a given constant.

4 A demand means that $\mathcal{E}$ energy (i.e., an energy block) needs to be charged. In
Fig. 2, the first EV (EV 1) in the EV queue wants to charge $3 \times \mathcal{E}$, so
it corresponds to the first three consecutive charge demands in the charge
demand queue. The second EV (EV 2) charges $2 \times \mathcal{E}$, so it corresponds to
the two consecutive charge demands after the first EV's corresponding charge
demands.

Denote the state space as $\mathcal{X}_e$ and denote the action space
as $\mathcal{A}_e$. Let the (random) system state and action in the $n$-th
period be $X_e[n] = (Q_e[n], A_e[n], E_b[n], E_a[n], P[n]) \in \mathcal{X}_e$
and $(K_e[n], W[n]) \in \mathcal{A}_e$, respectively. Define a policy
$\pi_e = (\pi_{e,0}, \pi_{e,1}, \cdots)$ with $\pi_{e,n}$ generating an action
$(k_e[n], w[n])$ with a probability [21], [24] at the $n$-th period. We
denote the set of all policies as $\Pi_e$. Let
$x_e[n] = (q_e[n], a_e[n], e_b[n], e_a[n], p[n])$ be a (fixed) system state.
The feasible $(k_e[n], w[n])$ in state $x_e[n]$ belongs to
$\mathcal{K}_e(x_e[n]) \times \mathcal{W}(x_e[n])$, where
$\mathcal{K}_e(x_e[n]) = \{0, 1, \cdots, \min\{q_e[n], M\}\}$ and
$\mathcal{W}(x_e[n]) = \{0, \frac{1}{\tau}, \cdots, \frac{e_b[n]}{\tau}\}$.$^{5}$
The optimization problem that minimizes the mean charge demand queue
length under the long term cost constraint, $B$, can be expressed as



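$$\min_{\pi_e \in \Pi_e} D_{x_e}^{\pi_e} := \limsup_{n \to \infty} \frac{1}{n} E_{x_e}^{\pi_e}\left[\sum_{i=0}^{n-1} Q_e[i]\right] \quad (2)$$

$$\text{s.t.} \quad \limsup_{n \to \infty} \frac{1}{n} E_{x_e}^{\pi_e}\left[\sum_{i=0}^{n-1} C_e[i]\right] \le B, \quad (3a)$$

$$K_e[i] \le \min\{Q_e[i], M\}, \quad (3b)$$

$$W[i] \le \frac{E_b[i]}{\tau}, \quad (3c)$$

with initial state $x_e = (q_e, a_e, e_b, e_a, p)$.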

Since a D-BMAP can be represented by a two-dimensional
discrete-time Markov chain (DTMC) [22], the optimization
problem in (2) can be analyzed in the framework of MDP.
Moreover, the following lemma proves the equivalence of the
mean charge demand queue length minimization and the mean
EV queue length minimization.

Lemma 1. The minimization of the mean charge demand
queue length is equivalent to the minimization of the mean
EV queue length.

Proof: See Appendix A. ■

III. Simplified problem 

For conciseness, we give a special case of problem (2) in this
section and investigate this relatively simplified problem in the
remainder of the paper to show the solving process. General
cases can be analyzed through a similar solving process.

When $C = 1$, we have $L_i = 1$. Then the queue mapping is
an identity transform, and "EV" and "demand" are interchangeable.
Thus, we can directly analyze the EV queue using the
MDP framework. We have $K[n] = K_e[n]$, $A[n] = A_e[n]$ and
$Q[n] = Q_e[n]$. The queue length evolution is

$$Q[n+1] = Q[n] - K[n] + A[n]. \quad (4)$$

The battery energy evolution is the same as (1). The cost at
the $n$-th period is given by

$$C[n] = \left(\frac{K[n]\mathcal{E}}{\tau} - W[n]\right)^{+} P[n], \quad (5)$$

where $(\cdot)^{+} := \max\{\cdot, 0\}$. The system state becomes
$X[n] = (Q[n], A[n], E_b[n], E_a[n], P[n])$ with state space
$\mathcal{X}$, and the action is $(K[n], W[n])$ with action space $\mathcal{A}$.
$\{X[n], (K[n], W[n])\}$ is a controlled Markov process. Define
a policy $\pi = (\pi_0, \pi_1, \cdots)$ such that $\pi_n$ generates an action
$(k[n], w[n])$ with a probability at the beginning of the $n$-th
period. We denote the set of all policies as $\Pi$. The
feasible $(k[n], w[n])$ in state $x[n]$ belongs to
$\mathcal{K}(x[n]) \times \mathcal{W}(x[n])$, where
$\mathcal{K}(x[n]) = \{0, 1, \ldots, \min\{q[n], M\}\}$ and
$\mathcal{W}(x[n]) = \{0, \frac{1}{\tau}, \ldots, \frac{e_b[n]}{\tau}\}$.
A stationary deterministic policy is $\pi = (g, g, \cdots)$, where
$g$ is a measurable mapping from $\mathcal{X}$ to $\mathcal{K}(x[n]) \times \mathcal{W}(x[n])$.

5 The energy has been discretized.

Our objective is to find a policy that minimizes the mean
queue delay under the long-run constraint on the cost. The
optimization problem (i.e., the constrained MDP) is given by



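$$\min_{\pi \in \Pi} D_x^{\pi} := \limsup_{n \to \infty} \frac{1}{n} E_x^{\pi}\left[\sum_{i=0}^{n-1} Q[i]\right] \quad (6)$$

$$\text{s.t.} \quad B_x^{\pi} := \limsup_{n \to \infty} \frac{1}{n} E_x^{\pi}\left[\sum_{i=0}^{n-1} C[i]\right] \le B, \quad (7a)$$

$$K[i] \le \min\{Q[i], M\}, \quad (7b)$$

$$W[i] \le \frac{E_b[i]}{\tau}, \quad (7c)$$

where $x = (q, a, e_b, e_a, p) \in \mathcal{X}$ is the initial system state.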

Remark: (6) is a special case of (2) with $C = 1$. $C = 1$
means that all EVs charge the same amount of energy, $\mathcal{E}$ (e.g.,
an EV production company).

IV. Analysis of the optimal policy 

In this section, we perform a theoretical study of the optimal
policy. First, we prove that the constrained MDP can be
analyzed through an unconstrained MDP. Then, we focus on 
the analysis of the unconstrained MDP. We analyze the uncon- 
strained MDP by using its corresponding discount MDP. Next, 
we consider the dimension reduction of the two-dimensional 
policy. Finally, we propose two stationary deterministic poli- 
cies based on the theoretical results. 



A. Transformation to the unconstrained MDP and discount 
MDP 

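Define $f_\beta(x, k, w) := \beta\left(\frac{k\mathcal{E}}{\tau} - w\right)^{+} p + q$. We have the
following unconstrained MDP (i.e., UP$_\beta$).

$$\min_{\pi \in \Pi} J_\beta^{\pi}(x) := \limsup_{n \to \infty} \frac{1}{n} E_x^{\pi}\left[\sum_{i=0}^{n-1} f_\beta(X[i], K[i], W[i])\right] \quad (8)$$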



Remark: UP$_\beta$ is an average cost MDP. Its optimal solution
is referred to as the average cost optimal policy.

The following lemma reveals that the constrained problem
has the same solution as UP$_\beta$ with a certain $\beta$.

Lemma 2. There exists $\beta > 0$ for which the optimal solution
of the unconstrained MDP in (8) (i.e., UP$_\beta$) is also optimal
for the constrained MDP in (6).

Proof: See Appendix B. ■

Next, we define a discount cost MDP with discount factor
$\alpha$ corresponding to UP$_\beta$ for each initial system state
$x = (q, a, e_b, e_a, p)$, with value function

$$V_\alpha(x) = \min_{\pi \in \Pi} E_x^{\pi}\left[\sum_{i=0}^{\infty} \alpha^{i} f_\beta(X[i], K[i], W[i])\right]. \quad (9)$$



The optimal solution for the discounted problem is called a 
discount optimal policy. 

The following lemma reveals the existence of the optimal
stationary deterministic policy of UP$_\beta$ and, furthermore, how
to derive the average cost optimal policy.

Lemma 3. There exists a stationary deterministic policy
$(k, w)$ that solves UP$_\beta$, which can be obtained as a limit of
discount optimal policies as $\alpha \to 1$.

Proof: See Appendix C. ■
Based on the above analysis, we find that the constrained 
MDP can be analyzed through the defined average cost MDP 
and its corresponding discount cost MDP thereafter. Hence, 
we first investigate the solution of the discount cost MDP in 
the following subsection. 

B. The discount optimal policy 

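For state-action pair $(x = (q, a, e_b, e_a, p), (k, w))$, let
$u = q - k$ and $\eta = e_b - w\tau$. Then $(u(x), \eta(x))$ can also define
a stationary deterministic policy. The discounted cost
optimality equation [23], [24] is given by

$$V_\alpha(q, a, e_b, e_a, p) = \min_{\substack{q - u \in \{0, 1, \cdots, \min\{q, M\}\} \\ \eta \in \{0, 1, \cdots, e_b\}}} \left\{ \beta\left(\frac{(q-u)\mathcal{E}}{\tau} - \frac{e_b - \eta}{\tau}\right)^{+} p + q + \alpha E_{a, e_a, p}\left[V_\alpha(u + A, A, (\eta + E_a)^{-}, E_a, P)\right] \right\}, \quad (10)$$

and the corresponding value iteration algorithm (or successive
approximation method) is

$$V_{\alpha,n}(q, a, e_b, e_a, p) = \min_{\substack{q - u \in \{0, 1, \cdots, \min\{q, M\}\} \\ \eta \in \{0, 1, \cdots, e_b\}}} \left\{ \beta\left(\frac{(q-u)\mathcal{E}}{\tau} - \frac{e_b - \eta}{\tau}\right)^{+} p + q + \alpha E_{a, e_a, p}\left[V_{\alpha,n-1}(u + A, A, (\eta + E_a)^{-}, E_a, P)\right] \right\} \quad (11)$$

with $V_{\alpha,0}(q, a, e_b, e_a, p) = 0$.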
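The value iteration can be sketched on a toy instance as follows. This is only an illustration under simplifying assumptions that are not part of the paper's general model: the arrivals, renewable energy and price are taken to be i.i.d. (so the state reduces to queue length, battery level and price), the queue is truncated at a hypothetical bound `q_max`, and all energies are integers. The function name `value_iteration` and its arguments are made up for this example.

```python
import itertools


def value_iteration(alpha, beta, tau, block, M, e_max, q_max,
                    arrivals, energies, prices, iters=100):
    """Successive approximation of the discounted value function.

    arrivals, energies, prices: dicts mapping a value to its probability
    (i.i.d. assumption). States are (q, e_b, p); actions are (u, eta) with
    u = q - k EVs left waiting and eta = battery energy left after charging.
    """
    outcomes = [(a, e, p2, pa * pe * pp)
                for a, pa in arrivals.items()
                for e, pe in energies.items()
                for p2, pp in prices.items()]
    states = list(itertools.product(range(q_max + 1), range(e_max + 1), prices))
    V = {s: 0.0 for s in states}          # V_{alpha,0} = 0
    policy = {}
    for _ in range(iters):
        new_V = {}
        for (q, e_b, p) in states:
            best = None
            for u in range(q - min(q, M), q + 1):
                for eta in range(e_b + 1):
                    # grid power = (charged energy - battery energy used)^+ / tau
                    grid = max((q - u) * block - (e_b - eta), 0) / tau
                    future = sum(prob * V[(min(u + a, q_max),
                                           min(eta + e, e_max), p2)]
                                 for a, e, p2, prob in outcomes)
                    total = beta * grid * p + q + alpha * future
                    if best is None or total < best[0]:
                        best = (total, u, eta)
            new_V[(q, e_b, p)] = best[0]
            policy[(q, e_b, p)] = (best[1], best[2])
        V = new_V
    return V, policy


V, policy = value_iteration(alpha=0.9, beta=1.0, tau=1, block=2, M=2,
                            e_max=4, q_max=4,
                            arrivals={0: 0.5, 1: 0.5},
                            energies={0: 0.5, 2: 0.5},
                            prices={1: 0.5, 3: 0.5}, iters=50)
```

The returned `policy` maps each state to the minimizing pair $(u, \eta)$ of the last sweep; the state space grows quickly with `q_max` and `e_max`, so the sketch is only practical for small instances.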

First, regarding $V_\alpha(q, a, e_b, e_a, p)$, we have the following
properties (Property 1 - Property 3).

Property 1. $V_\alpha(q, a, e_b, e_a, p)$ is an increasing function of $q$.

Proof: See Appendix D. ■

Property 2. $V_\alpha(q, a, e_b, e_a, p)$ is a non-increasing function of
$e_b$.

Proof: See Appendix E. ■

In practice, the allocated renewable energy will not surpass
the required charging energy. Thus, $k\mathcal{E} \ge w\tau$, i.e.,

$$\frac{(q-u)\mathcal{E}}{\tau} - \frac{e_b - \eta}{\tau} \ge 0. \quad (12)$$

Property 3. $V_\alpha(q, a, e_b, e_a, p)$ is convex in $(q, e_b)$.

Proof: See Appendix F. ■
Next, the following two lemmas reveal two necessary con- 
ditions for the optimality, respectively. 
Lemma 4. In state $x = (q, a, e_b, e_a, p)$, $(u(x), \eta(x))$ is not
the discount optimal solution if $u(x) > q - \min\{q, M\}$ and
$\eta(x) + e_a > E_{\max}$.

Remark: Lemma 4 reveals a sufficient condition for
non-optimality, which can also be viewed as a necessary
condition for optimality. That is to say, no optimal
solution satisfies the condition.

Lemma 5. Denote the discount optimal policy in state
$x = (q, a, e_b, e_a, p)$ as $(u^*(x), \eta^*(x))$. Then, $(u^*(x), \eta^*(x))$
satisfies the following inequality array$^{6}$

$$Z_1(u^*, a, \eta^*, e_a, p) \le \beta\frac{\mathcal{E}}{\tau}p \le Z_1(u^*+1, a, \eta^*, e_a, p), \quad (13)$$

$$Z_2(u^*, a, \eta^*, e_a, p) \le \beta\frac{p}{\tau} \le Z_2(u^*, a, \eta^*+1, e_a, p), \quad (14)$$

$$Z_3(u^*, a, \eta^*, e_a, p) \le \beta\frac{p}{\tau}(\mathcal{E}-1) \le Z_3(u^*+1, a, \eta^*+1, e_a, p), \quad (15)$$

where

$$Z_1(u, a, \eta, e_a, p) = \alpha E_{a, e_a, p}\left[G_1(u + A, A, (\eta + E_a)^{-}, E_a, P)\right] \quad (16)$$

with

$$G_1(q, a, e_b, e_a, p) = V_\alpha(q, a, e_b, e_a, p) - V_\alpha(q-1, a, e_b, e_a, p), \quad (17)$$

$$Z_2(u, a, \eta, e_a, p) = \alpha E_{a, e_a, p}\left[V_\alpha(u + A, A, (\eta + E_a)^{-}, E_a, P) - V_\alpha(u + A, A, (\eta - 1 + E_a)^{-}, E_a, P)\right], \quad (18)$$

and

$$Z_3(u, a, \eta, e_a, p) = \alpha E_{a, e_a, p}\left[V_\alpha(u + A, A, (\eta + E_a)^{-}, E_a, P) - V_\alpha(u - 1 + A, A, (\eta - 1 + E_a)^{-}, E_a, P)\right]. \quad (19)$$

Proof: See Appendix G. ■

Remark: Lemma 5 gives the necessary condition of the
discount optimality, i.e., the optimal policy (or policies) should
be the solution(s) of the inequality array. In particular, if the
inequality array has a single solution, that solution is the
optimal policy, since an optimal policy exists.

C. The average cost optimal policy 

First, Lemma 4 still holds for the average cost MDP. Next,
based on Lemma 3 and Lemma 5, we have the following
lemma.

6 Using Property 3, we can derive that $Z_1(u, a, \eta, e_a, p) \le
Z_1(u+1, a, \eta, e_a, p)$, $Z_2(u, a, \eta, e_a, p) \le Z_2(u, a, \eta+1, e_a, p)$,
and $Z_3(u, a, \eta, e_a, p) \le Z_3(u+1, a, \eta+1, e_a, p)$.

Lemma 6. Given state $x = (q, a, e_b, e_a, p)$, the average cost
optimal policy $(u^*(x), \eta^*(x))$ should satisfy the following
inequality array

$$\bar{Z}_1(u^*, a, \eta^*, e_a, p) \le \beta\frac{\mathcal{E}}{\tau}p \le \bar{Z}_1(u^*+1, a, \eta^*, e_a, p), \quad (20)$$

$$\bar{Z}_2(u^*, a, \eta^*, e_a, p) \le \beta\frac{p}{\tau} \le \bar{Z}_2(u^*, a, \eta^*+1, e_a, p), \quad (21)$$

$$\bar{Z}_3(u^*, a, \eta^*, e_a, p) \le \beta\frac{p}{\tau}(\mathcal{E}-1) \le \bar{Z}_3(u^*+1, a, \eta^*+1, e_a, p), \quad (22)$$

where

$$\bar{Z}_1(u, a, \eta, e_a, p) = \lim_{\alpha \to 1} Z_1(u, a, \eta, e_a, p),$$

$$\bar{Z}_2(u, a, \eta, e_a, p) = \lim_{\alpha \to 1} Z_2(u, a, \eta, e_a, p),$$

and

$$\bar{Z}_3(u, a, \eta, e_a, p) = \lim_{\alpha \to 1} Z_3(u, a, \eta, e_a, p).$$



D. Reducing the policy's dimension

The number of charging EVs $k$ and the power allocation
from the battery $w$ are coupled, i.e., they affect each
other. However, if we assume that $k$ has been chosen, then
the required total power is fixed. In this case, we will
allocate as much power as possible from the battery to meet
the required total power, i.e., the greedy policy for the battery
power allocation. This is because the power from the battery
is free (please refer to (5)). We can guess that the greedy
allocation strategy of battery power is the optimal policy.
However, it is difficult to prove. The difficulty lies in the fact
that the remaining battery energy will affect the future action
and cost (e.g., (10)). On the other hand, once $w$ has been fixed,
the power allocation from the power grid can also affect $k$. In
summary, when $k$ is chosen, the optimal $w^*$ is the greedy
policy. By contrast, if $w$ is fixed, the optimal $k$ is not fixed,
and we need to solve the power allocation from the power grid
to find the optimal $k^*$. Thus, we can reduce the policy from
$(k, w)$ to $k$. We have the following conjecture.

Conjecture 1. Let $\pi_k = (k[0], k[1], \cdots)$, and (6) can be
converted to

$$\min_{\pi_k} D_x^{\pi_k} := \limsup_{n \to \infty} \frac{1}{n} E_x^{\pi_k}\left[\sum_{i=0}^{n-1} Q[i]\right] \quad (23)$$

$$\text{s.t.} \quad B_x^{\pi_k} := \limsup_{n \to \infty} \frac{1}{n} E_x^{\pi_k}\left[\sum_{i=0}^{n-1} \left(\frac{K[i]\mathcal{E}}{\tau} - \frac{1}{\tau}\min\{K[i]\mathcal{E}, E_b[i]\}\right)^{+} P[i]\right] \le B, \quad (24a)$$

$$K[i] \le \min\{Q[i], M\}, \quad (24b)$$

where the evolution of energy in the battery becomes

$$E_b[i+1] = \left(E_b[i] - \min\{K[i]\mathcal{E}, E_b[i]\} + E_a[i]\right)^{-}. \quad (25)$$

Remark: The policy can be reduced in dimension ($(k, w) \to k$).
If the $\beta$ stated in Lemma 2 satisfies $\beta \gg 1$, Conjecture
1 can be proved based on (10) together with Lemma 3 and
Lemma 2.

In the following, we discuss the optimal policy after dimension
reduction. For state-action pair $(x = (q, a, e_b, e_a, p), k)$,
let $u = q - k$, and $u(x)$ can also define a stationary
deterministic policy. We have the following lemmas to reveal
the properties of the optimal policy.

Lemma 7. Denote the discount optimal policy in state
$x = (q, a, e_b, e_a, p)$ as $u^*(x)$. Then, $u^*(x)$ satisfies

$$Z(u^*) \le \beta\frac{\mathcal{E}}{\tau}p \le Z(u^*+1), \quad (26)$$

where

$$Z(u) = \alpha E_{a, e_a, p}\left[V_\alpha(u + A, A, (\eta(u) + E_a)^{-}, E_a, P) - V_\alpha(u - 1 + A, A, (\eta(u-1) + E_a)^{-}, E_a, P)\right] + \beta\frac{\eta(u) - \eta(u-1)}{\tau}p \quad (27)$$

with

$$\eta(u) := \max\{0, e_b - (q-u)\mathcal{E}\}. \quad (28)$$

Furthermore, the average cost optimal policy $u^*$ satisfies

$$\bar{Z}(u^*) \le \beta\frac{\mathcal{E}}{\tau}p \le \bar{Z}(u^*+1) \quad (29)$$

with

$$\bar{Z}(u) = \lim_{\alpha \to 1} Z(u). \quad (30)$$

Proof: See Appendix H. ■

Lemma 8. For $x = (q, a, e_b, e_a, p)$ satisfying

$$Z(q - \min\{q, M\}) \ge \beta\frac{\mathcal{E}}{\tau}p, \quad (31)$$

$u = q - \min\{q, M\}$ is the discount optimal policy. In addition,
for $(q, a, e_b, e_a, p)$ satisfying

$$Z(q) \le \beta\frac{\mathcal{E}}{\tau}p, \quad (32)$$

$u = q$ is the discount optimal policy.

Proof: See Appendix I. ■

Remark: $u = q - \min\{q, M\}$, i.e., $k = \min\{q, M\}$, means
charging as many EVs as possible. If the number of EVs in the
queue is less than the charge point number $M$, charge all the
EVs. Otherwise, charge $M$ EVs from the head of the queue.
$u = q$, i.e., $k = 0$, denotes charging no EV.

Based on Lemma 8 and Lemma 3, we have

Lemma 9. For $x = (q, a, e_b, e_a, p)$ satisfying

$$\bar{Z}(q - \min\{q, M\}) \ge \beta\frac{\mathcal{E}}{\tau}p, \quad (33)$$

$u = q - \min\{q, M\}$ is the average cost optimal policy. In
addition, for $(q, a, e_b, e_a, p)$ satisfying

$$\bar{Z}(q) \le \beta\frac{\mathcal{E}}{\tau}p, \quad (34)$$

$u = q$ is the average cost optimal policy.

E. Two stationary deterministic policies

Based on all the above theoretical analysis, we propose the
following two specific stationary deterministic policies. For
state $x = (q, a, e_b, e_a, p)$, we define the radical policy as
$(k = \min\{q, M\}, w = \min\{\frac{e_b}{\tau}, \frac{k\mathcal{E}}{\tau}\})$. That is to say, we charge
as many EVs as possible, and use the greedy policy for the
battery energy allocation, i.e., if the required energy is not
greater than the battery energy, then all the energy will be
supplied from the storage battery and no grid power will be
used. Otherwise, all the storage battery energy is allocated,
and the rest will be supplied from the power grid.

In the radical policy, the average cost constraint is not
considered. We therefore propose another policy (i.e., the
conservative policy) that guarantees the average cost constraint
by satisfying the cost constraint in each period. We call
the policy $(k = \min\{q, M, \lfloor\frac{e_b + B\tau/p}{\mathcal{E}}\rfloor\}, w = \frac{\min\{e_b, k\mathcal{E}\}}{\tau})$ the
conservative policy. That is to say, we first guarantee that the
cost of charging in each period does not exceed the average cost
constraint, then charge as many EVs as possible and utilize
the greedy policy for the battery energy allocation.

In the whole paper, we assume that the power from power 
grid and renewable energy generator is sufficient to stabilize 
the queue length. The stability issue such as the bounds on 
average generation rate of renewable energy or average EV 
arrival rate will be studied in future work. 
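The radical and conservative policies of Section IV-E can be sketched as two small functions. This is an illustration rather than the authors' implementation: the function names are hypothetical, and the per-period cap in `conservative_policy` is derived here from the requirement that the cost $(k\mathcal{E} - e_b)^{+} p / \tau$ of a single period not exceed $B$, which is an assumption about the exact form of the cap.

```python
import math


def radical_policy(q, e_b, M, tau, block):
    """Charge as many EVs as possible; take battery energy greedily."""
    k = min(q, M)
    w = min(e_b, k * block) / tau      # battery power, never more than needed
    return k, w


def conservative_policy(q, e_b, p, M, tau, block, B):
    """Largest k whose single-period grid cost stays within the bound B."""
    if p > 0:
        # (k*block - e_b) / tau * p <= B  <=>  k <= (e_b + B*tau/p) / block
        k_cost = math.floor((e_b + B * tau / p) / block)
    else:
        k_cost = q                     # free grid power: the cost never binds
    k = max(min(q, M, k_cost), 0)
    w = min(e_b, k * block) / tau
    return k, w


# Ten waiting EVs, 30 units in the battery, price 10, eight charge points.
k_rad, w_rad = radical_policy(q=10, e_b=30, M=8, tau=1, block=10)
k_con, w_con = conservative_policy(q=10, e_b=30, p=10, M=8, tau=1,
                                   block=10, B=200)
```

With these numbers the radical policy charges eight EVs, while the conservative policy stops at five so that the grid cost of the period equals the bound of 200.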

V. Numerical results 

In this section, we perform simulations to demonstrate the 
relations among the mean EV arrival, mean renewable energy 
arrival, upper bound of the average cost, average cost, and 
average EV queue length. Meanwhile, we consider different 
charge point numbers and capacities of the renewable energy 
storage battery. In the simulations, the period length is $\tau = 1$,
and the size of the "energy block" is $\mathcal{E} = 10$.

Fig. 3 shows the average cost performance with respect to
the mean EV arrival, $\lambda$. In the simulations, we utilize the
radical policy. We consider the i.i.d. cases of $A$, $E_a$, and $P$.
$A$ takes values 0 and $2\lambda$ with equal probability 0.5. $E_a$ takes
values $\{0, 50, 100\}$ with probabilities $\{0.1, 0.4, 0.5\}$. $P$ takes
values $\{5, 10, 20\}$ with probabilities $\{0.2, 0.3, 0.5\}$. The
performance is averaged over $10^5$ periods. We set the number of
charge points $M = 50$ and $M = 8$ in Fig. 3(a) and Fig. 3(b),
respectively. Furthermore, we plot the curves for different
storage battery capacities: $E_{\max} = 100$, $E_{\max} = 300$ and
infinite capacity, respectively.
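A simulation of this kind can be approximated with a short Monte Carlo sketch. It is not the authors' simulation code: the function name `simulate_radical`, the empty initial queue and battery, the seed, and the order in which the random quantities are drawn are all assumptions; only the distributions, the period length and the energy block size follow the setup described above.

```python
import random


def simulate_radical(mean_arrival, M, e_max, periods=100_000,
                     tau=1, block=10, seed=0):
    """Average per-period cost of the radical policy with i.i.d. inputs."""
    rng = random.Random(seed)
    q, e_b, total_cost = 0, 0, 0.0
    for _ in range(periods):
        # Radical policy: charge as many EVs as possible, battery first.
        k = min(q, M)
        w = min(e_b, k * block) / tau
        price = rng.choices([5, 10, 20], weights=[0.2, 0.3, 0.5])[0]
        e_a = rng.choices([0, 50, 100], weights=[0.1, 0.4, 0.5])[0]
        arrivals = rng.choice([0, 2 * mean_arrival])   # mean is mean_arrival
        total_cost += max(k * block / tau - w, 0.0) * price
        q = q - k + arrivals
        e_b = min(e_b - w * tau + e_a, e_max)          # overflow is lost
    return total_cost / periods


# Average cost for M = 50 and two battery capacities at mean arrival 20.
cost_small_battery = simulate_radical(20, M=50, e_max=100, periods=10_000)
cost_large_battery = simulate_radical(20, M=50, e_max=300, periods=10_000)
```

Passing `float("inf")` as `e_max` gives the infinite-capacity case; because the seed is fixed, runs with different capacities see the same arrival, energy and price sequences.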

In Fig. 3(a), we can see that when $\lambda$ is small, the cost
is nearly zero. However, when $\lambda$ is large (e.g., $\lambda > 10$),
the cost increases rapidly with increase of $\lambda$, roughly
according to a linear function. This is because when $\lambda$ is small,
the required energy is small and the battery can supply the
energy. Thus, no grid power is consumed and the cost
is zero. Once $\lambda$ is larger than a certain value, the required
energy is larger than the battery energy, and the grid power
is utilized. As $M$ is large (compared to the considered
$\lambda$), i.e., the restriction on the number of charge points does not
influence the performance, we have $k = \min\{q, M\} = q$ with
a high probability. The grid power consumption increases
with increase of $\lambda$. Moreover, when $\lambda$ is large, the grid power
becomes the main energy source. Based on (5), we derive that
the cost varies with $\lambda$ roughly according to a linear relation.

From Fig. 3(b), we can find that the average cost is zero
when $\lambda$ is small, and with increase of $\lambda$, the average cost
increases. But once $\lambda$ is larger than a certain value, the average
cost remains constant. It can be explained as follows: when $\lambda$
is small, the required energy can be supplied by the battery
with a very high probability and no grid power is needed.
Then the average cost is zero. When $\lambda$ increases, the required
energy increases. Once the battery energy is not enough, the
grid power is consumed to fill the gap between the
required energy and battery energy. With increase of $\lambda$, the
grid power consumption increases since the average battery
energy is constant. Thus, the average cost increases. However,
when $\lambda$ is large enough, we get $k = \min\{q, M\} = M$ with
a high probability because $M$ is not large in this simulation.
Then, the required energy is $k \times \mathcal{E} = M \times \mathcal{E}$, i.e., it becomes a
constant. That means the grid power consumption is also a
constant. Thus, the cost remains static.

Fig. [4] depicts the average cost performance with respect 
to the mean renewable energy arrival, E a . The radical policy 
is applied in the simulations. A takes values and 10 with 
equal probability 0.5. E a take values {0, jE a , -yE a } with 
probabilities {0.1, 0.4, 0.5}, respectively. $P$ is the same as in
Fig. [3] and $M = 50$. Battery capacities $E_{\max} = 100$,
$E_{\max} = 300$, and infinite capacity are considered in the
simulations. From the figure, we can see that the cost decreases
as $E_a$ increases, but once $E_a$ is large enough, the cost
remains almost static. In the range of small $E_a$, more free
renewable energy arrives and is stored in the battery as $E_a$
increases, so the cost decreases. If the battery capacity is large
enough, all the arriving renewable energy can be stored, and the
battery energy keeps increasing with $E_a$. Once the battery
energy exceeds the energy required for charging, no grid power
is needed and the cost becomes zero from then on.
If the battery capacity is not large (e.g., $E_{\max} = 100$ in the
figure), overflow occurs when $E_a$ is large. That is,
the battery energy remains at $E_{\max}$ even though we increase
$E_a$. Moreover, $E_{\max}$ is smaller than the required
charge energy, so grid power is still needed. Consequently,
the cost is non-zero and remains static.
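
The overflow effect can be reproduced with a very small model. The sketch below is an illustration under stated assumptions, not the paper's simulator: arrivals and demand are deterministic constants, the battery serves the demand before the new renewable energy is stored, and the names `step_battery` and `average_grid_energy` are hypothetical.

```python
def step_battery(level, arrival, demand, capacity):
    """One period of a simplified battery model (illustrative assumption).

    The battery serves the demand first, the grid covers any shortfall,
    and renewable energy arriving beyond the capacity is lost.
    Returns (new_level, grid_energy).
    """
    from_battery = min(level, demand)
    grid_energy = demand - from_battery
    # Overflow: anything above the capacity is discarded.
    new_level = min(level - from_battery + arrival, capacity)
    return new_level, grid_energy


def average_grid_energy(arrival, demand, capacity, periods=1000):
    """Average grid energy per period, starting from an empty battery."""
    level, total = 0.0, 0.0
    for _ in range(periods):
        level, grid = step_battery(level, arrival, demand, capacity)
        total += grid
    return total / periods


# Hypothetical numbers: demand 150 per period, capacity 100 or unlimited.
for arrival in (50.0, 200.0, 500.0):
    print(arrival,
          average_grid_energy(arrival, 150.0, 100.0),
          average_grid_energy(arrival, 150.0, float("inf")))
```

With the finite capacity, increasing the arrival beyond the capacity does not change the grid energy, whereas with unlimited capacity the grid energy falls to nearly zero.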

From Fig. [3] and Fig. [4] we can observe that the larger
the battery capacity, the lower the cost. This is because a larger
$E_{\max}$ lowers the probability of overflow (it is zero for
infinite capacity), so less free renewable energy is wasted and
the cost is lower. Furthermore, we can deduce that if $A$ is less
than a certain value or $E_a$ is larger than a certain value, the
average cost stays below a certain value. We therefore claim that
when $A$ is less than a certain value or $E_a$ is larger than a
certain value, the radical policy is also optimal even when the
cost constraint is considered.$^7$

7 Notice that the radical policy is optimal for the mean EV queue delay
minimization without the average cost constraint.

Fig. [5] illustrates the average EV queue length performance
with respect to the upper bound of the average cost when the
conservative policy is applied. In the simulations, $A$ takes
values {0, 12} with equal probability 0.5. $E_a$ and $P$ have
the same settings as in Fig. [3]. In the plots, we consider
different values of the battery capacity and the number of
charge points. We can observe that the average length performance
improves as $B$ increases, and when $B$ is larger than a
certain value, the average length performance becomes almost
constant. The reason is as follows: when B is small, k = 

min{g, M. — -^—} — min{g, — -^—} with a high probability 
and it increases with $B$. Thus, the average EV
queue length performance improves. Once $B$ is large enough,
we get $k = \min\{q, M\}$, and the average length remains static
with respect to $B$. Additionally, by comparing the four curves,
we can see that the larger the battery capacity or the number of
charge points, the better the length performance.
VI. Conclusion

We consider the scheduling of EV charging at a
charging station whose energy is supplied by both the power
grid and local renewable energy. Under uncertainty in the
EV arrivals, the renewable energy, the grid power price, and the
charging energy of each EV, we study mean-delay-optimal
scheduling under an average cost constraint. We analyze the
optimal policy of the formulated MDP problem. In addition,
two specific stationary policies (the radical policy and the
conservative policy) are applied in the simulations to reveal the
impact of relevant parameters on the performance.

Appendix A 
Proof of Lemma [1]

First, the energy demand queue length and the EV queue
length are related by

$$Q_e[n] = \sum_{i=1}^{Q[n]} L_i,$$

with $L_i$ being independent of the queue state. Thus the average
energy demand queue length is

$$\frac{1}{N}\sum_{j=1}^{N} Q_e[j] = \frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{Q[j]} L_i.$$

Meanwhile, if an EV arrives earlier than another EV, it
leaves earlier in the EV queue. Under the queue
mapping mechanism, the earlier-arrived EV also leaves no later
in the energy demand queue.$^8$ That is to say,
the queue mapping is an isotonic mapping. We therefore claim
that a policy minimizing the mean EV queue length results in the
minimal mean demand queue length, and vice versa.

8 Leaving at the same time is possible.

Appendix B
Proof of Lemma [2]

The proof is based on the results of [25]. We prove that for
some $\beta$, the optimal policy $\pi^*$ of the unconstrained MDP (8)
(i.e., UP$_\beta$) satisfies 1) $\pi^*$ yields $B^{\pi^*}$ and
$D^{\pi^*}$ as limits for all $x \in X$; 2) $B^{\pi^*} = B$. Observe
that limsup and liminf are equal for each $\beta > 0$ (since the
controlled chain is ergodic and the policy is stationary [24]).

Appendix C
Proof of Lemma [3]

First, we verify that the conditions of Proposition 2.1 in [26]
are satisfied, so a discount-optimal stationary policy exists.
Next, we prove that for some $x_0$, $V_\alpha(x) - V_\alpha(x_0) < \infty$.
Third, there exists a policy $\pi \in A$ and an initial state
$x \in X$ such that $J_\beta < \infty$ in the practical problem.
Otherwise, the cost is infinite for all policies and any policy is
optimal. Accordingly, we can prove the lemma by applying
Theorem 3.8 in [26].

Appendix D
Proof of Property [1]

We verify the increasing property by induction. According
to (11), $V_{\alpha,0} = 0$ and

$$V_{\alpha,1} = \beta\left(\frac{(q-\min\{q,M\})\mathcal{E}-e_b}{\tau}\right)^{+} p + q.$$

The increasing property in $q$ holds. Assume
$V_{\alpha,n-1}(q,a,e_b,e_a,p)$ is increasing in $q$. Depending on
the value of $M$, we have the following two cases.

Case 1: $M \ge q+1$. Fix $(a,e_b,e_a,p)$. In the state
$(q+1,a,e_b,e_a,p)$, the set of feasible $u$ is $\{0,1,\dots,q+1\}$,
whereas it is $\{0,1,\dots,q\}$ for state $(q,a,e_b,e_a,p)$. Consider
state $(q+1,a,e_b,e_a,p)$ and let the optimal action be
$(u^*,\eta^*)$ with $u^* \in \{0,1,\dots,q\}$; hence

$$V_{\alpha,n}(q+1,a,e_b,e_a,p) = \beta\left(\frac{(q+1-u^*)\mathcal{E}-(e_b-\eta^*)}{\tau}\right)^{+} p + (q+1) + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(u^*+A, A, (\eta^*+E_a)^-, E_a, P)\right]. \tag{35}$$

As $(u^*,\eta^*)$ is feasible in state $(q,a,e_b,e_a,p)$,

$$V_{\alpha,n}(q,a,e_b,e_a,p) \le \beta\left(\frac{(q-u^*)\mathcal{E}-(e_b-\eta^*)}{\tau}\right)^{+} p + q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(u^*+A, A, (\eta^*+E_a)^-, E_a, P)\right] \le V_{\alpha,n}(q+1,a,e_b,e_a,p). \tag{36}$$

If $(u^*,\eta^*)$ with $u^* = q+1$,

$$V_{\alpha,n}(q+1,a,e_b,e_a,p) = q+1 + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(q+1+A, A, (\eta^*+E_a)^-, E_a, P)\right]. \tag{37}$$

Meanwhile, since $(q,\eta^*)$ is feasible in state $(q,a,e_b,e_a,p)$,

$$V_{\alpha,n}(q,a,e_b,e_a,p) \le q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(q+A, A, (\eta^*+E_a)^-, E_a, P)\right] \overset{(a)}{\le} V_{\alpha,n}(q+1,a,e_b,e_a,p), \tag{38}$$

where (a) holds by the induction hypothesis.

Case 2: $M \le q$. The set of feasible $u$ is $\{0,1,\dots,M\}$ in
both the state $(q+1,a,e_b,e_a,p)$ and the state $(q,a,e_b,e_a,p)$.
Then we can prove the increasing property of
$V_{\alpha,n}(q,a,e_b,e_a,p)$ by using (35) and (36).



Appendix E 
Proof of Property [2] 

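Based on (11), the property can be proved through
induction. First, we have $V_{\alpha,0} = 0$ and

$$V_{\alpha,1} = \beta\left(\frac{(q-\min\{q,M\})\mathcal{E}-e_b}{\tau}\right)^{+} p + q.$$

Thus the non-increasing property in $e_b$ holds for $n = 0, 1$.
Next, assume $V_{\alpha,n-1}(q,a,e_b,e_a,p)$ is a non-increasing
function of $e_b$. Fix $(q,a,e_a,p)$. For state $(q,a,e_b,e_a,p)$,
let $(u^*,\eta^*)$ be the optimal policy. We get

$$V_{\alpha,n}(q,a,e_b,e_a,p) = \beta\left(\frac{(q-u^*)\mathcal{E}-(e_b-\eta^*)}{\tau}\right)^{+} p + q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(u^*+A, A, (\eta^*+E_a)^-, E_a, P)\right]. \tag{39}$$

Since $(u^*,\eta^*)$ is feasible in state $(q,a,e_b+1,e_a,p)$, we derive

$$V_{\alpha,n}(q,a,e_b+1,e_a,p) \le \beta\left(\frac{(q-u^*)\mathcal{E}-(e_b+1-\eta^*)}{\tau}\right)^{+} p + q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}(u^*+A, A, (\eta^*+E_a)^-, E_a, P)\right]. \tag{40}$$

Combining (39) and (40), we get

$$V_{\alpha,n}(q,a,e_b,e_a,p) \ge V_{\alpha,n}(q,a,e_b+1,e_a,p).$$

This completes the proof of the property.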
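
The two monotonicity properties (increasing in $q$, non-increasing in $e_b$) can be checked numerically on a toy instance. The sketch below is not the paper's model: it assumes i.i.d. arrivals instead of Markov ones, a fixed price, integer energy units, a truncated queue, and a feasible set $u \in \{q-\min\{q,M\},\dots,q\}$, $\eta \in \{0,\dots,e_b\}$, where $u$ is the queue length left after charging and $\eta$ is the energy left in the battery. All parameter values and the name `value_iteration` are illustrative assumptions.

```python
def value_iteration(n_iter=30, q_max=6, e_max=4, M=2, unit=1,
                    beta=1.0, price=1.0, alpha=0.9):
    """Finite-horizon recursion V_n on a toy charging model (assumption)."""
    # (A, E_a, probability): i.i.d. demand and renewable arrivals.
    arrivals = [(a, e, 0.25) for a in (0, 1) for e in (0, 1)]
    V = [[0.0] * (e_max + 1) for _ in range(q_max + 1)]  # V_0 = 0
    for _ in range(n_iter):
        new_V = [[0.0] * (e_max + 1) for _ in range(q_max + 1)]
        for q in range(q_max + 1):
            for eb in range(e_max + 1):
                best = float("inf")
                # u: demands left in the queue, eta: energy left in the battery
                for u in range(q - min(q, M), q + 1):
                    for eta in range(eb + 1):
                        # Grid energy covers what the battery does not supply.
                        grid = max((q - u) * unit - (eb - eta), 0)
                        future = sum(
                            prob * V[min(u + a, q_max)][min(eta + e, e_max)]
                            for a, e, prob in arrivals)
                        cost = beta * grid * price + q + alpha * future
                        best = min(best, cost)
                new_V[q][eb] = best
        V = new_V
    return V


V = value_iteration()
for row in V:
    print([round(v, 2) for v in row])
```

In the printed table, rows correspond to $q$ and columns to $e_b$; values should not decrease down a column and should not increase along a row.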



Appendix F 
Proof of Property [3] 

First, we prove the following proposition. 

Proposition 1. For $\phi \in (0,1)$ and $\forall x_1, x_2, y$, we have
$\phi\min\{x_1,y\} + (1-\phi)\min\{x_2,y\} \le \min\{\phi x_1 + (1-\phi)x_2, y\}$.

Proof: The proposition can be verified by considering
$\min\{x_1,x_2\} \ge y$, $\max\{x_1,x_2\} \le y$, and
$\min\{x_1,x_2\} < y < \max\{x_1,x_2\}$, respectively. ■
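
As a sanity check, the inequality of Proposition 1 can be tested on random inputs. This is a minimal sketch that complements the case analysis and does not replace it; the helper names `lhs`, `rhs`, and `check_proposition`, the sampling ranges, and the tolerance are assumptions made for illustration.

```python
import random


def lhs(phi, x1, x2, y):
    """phi * min{x1, y} + (1 - phi) * min{x2, y}."""
    return phi * min(x1, y) + (1 - phi) * min(x2, y)


def rhs(phi, x1, x2, y):
    """min{phi * x1 + (1 - phi) * x2, y}."""
    return min(phi * x1 + (1 - phi) * x2, y)


def check_proposition(trials=10_000, seed=0, tol=1e-9):
    """Return True if lhs <= rhs (up to tol) on every random sample."""
    rng = random.Random(seed)
    for _ in range(trials):
        phi = rng.uniform(0.0, 1.0)
        x1, x2, y = (rng.uniform(-10.0, 10.0) for _ in range(3))
        if lhs(phi, x1, x2, y) > rhs(phi, x1, x2, y) + tol:
            return False
    return True


print(check_proposition())
print(lhs(0.5, 0, 4, 2), rhs(0.5, 0, 4, 2))  # the mixed case: x1 < y < x2
```

The mixed case $\min\{x_1,x_2\} < y < \max\{x_1,x_2\}$ is the one where the inequality can be strict, because the cap $y$ binds on only one of the two terms on the left.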
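The convexity is proved by induction. For $n = 0$, $V_{\alpha,0} = 0$
and is convex. Assume $V_{\alpha,n-1}(q,a,e_b,e_a,p)$ is convex in
$(q,e_b)$. Fix $(a,e_a,p)$, and let $(u_1,\eta_1)$ and $(u_2,\eta_2)$
be the optimal policies for $(q_1,e_{b1})$ and $(q_2,e_{b2})$.
Then, we get

$$
\begin{aligned}
&\phi V_{\alpha,n}(q_1,a,e_{b1},e_a,p) + (1-\phi)V_{\alpha,n}(q_2,a,e_{b2},e_a,p) \\
&= \phi\left[\beta\left(\frac{(q_1-u_1)\mathcal{E}-(e_{b1}-\eta_1)}{\tau}\right)^{+} p + q_1\right]
 + (1-\phi)\left[\beta\left(\frac{(q_2-u_2)\mathcal{E}-(e_{b2}-\eta_2)}{\tau}\right)^{+} p + q_2\right] \\
&\quad + \alpha\,\mathbb{E}_{a,e_a,p}\left[\phi V_{\alpha,n-1}(u_1+A, A, (\eta_1+E_a)^-, E_a, P)
 + (1-\phi)V_{\alpha,n-1}(u_2+A, A, (\eta_2+E_a)^-, E_a, P)\right] \\
&\overset{(b)}{\ge} \beta\left(\frac{\left(\phi(q_1-u_1)+(1-\phi)(q_2-u_2)\right)\mathcal{E}-\left(\phi(e_{b1}-\eta_1)+(1-\phi)(e_{b2}-\eta_2)\right)}{\tau}\right)^{+} p
 + \left[\phi q_1+(1-\phi)q_2\right] \\
&\quad + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}\left(\phi u_1+(1-\phi)u_2+A, A, \phi(\eta_1+E_a)^-+(1-\phi)(\eta_2+E_a)^-, E_a, P\right)\right] \\
&\overset{(c)}{\ge} \beta\left(\frac{\left(\phi(q_1-u_1)+(1-\phi)(q_2-u_2)\right)\mathcal{E}-\left(\phi(e_{b1}-\eta_1)+(1-\phi)(e_{b2}-\eta_2)\right)}{\tau}\right)^{+} p
 + \left[\phi q_1+(1-\phi)q_2\right] \\
&\quad + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_{\alpha,n-1}\left(\phi u_1+(1-\phi)u_2+A, A, (\phi\eta_1+(1-\phi)\eta_2+E_a)^-, E_a, P\right)\right] \\
&\overset{(d)}{\ge} V_{\alpha,n}\left(\phi q_1+(1-\phi)q_2, a, \phi e_{b1}+(1-\phi)e_{b2}, e_a, p\right),
\end{aligned}
$$

where (b) holds because of the convexity of
$V_{\alpha,n-1}(q,a,e_b,e_a,p)$, (c) holds because of Proposition
1 as well as Property [2], and (d) holds since
$(\phi u_1+(1-\phi)u_2, \phi\eta_1+(1-\phi)\eta_2)$ is feasible for
$\phi(q_1,a,e_{b1},e_a,p) + (1-\phi)(q_2,a,e_{b2},e_a,p)$.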

Appendix G 
Proof of Lemma [5]



Let

$$S(u,\eta) = \beta\left(\frac{(q-u)\mathcal{E}-(e_b-\eta)}{\tau}\right)^{+} p + q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right]. \tag{41}$$

First, we have

$$S(u+1,\eta) - S(u,\eta) = -\beta\frac{\mathcal{E}}{\tau}p + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+1+A, A, (\eta+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right] \tag{42}$$

and

$$S(u-1,\eta) - S(u,\eta) = \beta\frac{\mathcal{E}}{\tau}p + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u-1+A, A, (\eta+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right]. \tag{43}$$

Then applying $S(u^*+1,\eta^*) - S(u^*,\eta^*) \ge 0$ and
$S(u^*-1,\eta^*) - S(u^*,\eta^*) \ge 0$, we obtain (O. Similarly, as

$$S(u,\eta+1) - S(u,\eta) = \beta\frac{p}{\tau} + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+A, A, (\eta+1+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right] \tag{44}$$

and

$$S(u,\eta-1) - S(u,\eta) = -\beta\frac{p}{\tau} + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+A, A, (\eta-1+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right], \tag{45}$$

we can reach (14) from $S(u^*,\eta^*+1) - S(u^*,\eta^*) \ge 0$ and
$S(u^*,\eta^*-1) - S(u^*,\eta^*) \ge 0$. In addition,

$$S(u+1,\eta+1) - S(u,\eta) = \beta\frac{p}{\tau}(1-\mathcal{E}) + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+1+A, A, (\eta+1+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right] \tag{46}$$

and

$$S(u-1,\eta-1) - S(u,\eta) = \beta\frac{p}{\tau}(\mathcal{E}-1) + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u-1+A, A, (\eta-1+E_a)^-, E_a, P) - V_\alpha(u+A, A, (\eta+E_a)^-, E_a, P)\right]. \tag{47}$$

Then, (13) can be obtained by applying
$S(u^*-1,\eta^*-1) - S(u^*,\eta^*) \ge 0$ and
$S(u^*+1,\eta^*+1) - S(u^*,\eta^*) \ge 0$.
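
The proof rests on the fact that, at an optimal action, moving to any of the six neighbouring actions cannot decrease $S$. The sketch below illustrates this on a made-up objective; `toy_S` is a hypothetical convex function standing in for $S(u,\eta)$, and the search ranges are arbitrary. It is not the paper's cost function. These conditions are necessary for optimality, which is how they are used above.

```python
def neighbour_differences(S, u_star, eta_star):
    """First differences of S at (u*, eta*) along the six directions used above."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1)]
    base = S(u_star, eta_star)
    return {step: S(u_star + step[0], eta_star + step[1]) - base
            for step in steps}


def toy_S(u, eta):
    """Hypothetical convex stand-in for S(u, eta), minimised at (3, 2)."""
    return (u - 3) ** 2 + (eta - 2) ** 2 + 0.5 * (u - 3) * (eta - 2)


# Brute-force minimiser over a small integer grid.
u_star, eta_star = min(
    ((u, eta) for u in range(7) for eta in range(5)),
    key=lambda action: toy_S(*action))

diffs = neighbour_differences(toy_S, u_star, eta_star)
print((u_star, eta_star), diffs)
```

Every difference is non-negative at the minimiser, mirroring the inequalities applied to (42) through (47).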

Appendix H 
Proof of Lemma [7]

First, based on Conjecture [1], we only need to consider
the policy set $\{(u,\eta) : (u,\eta = \eta(u)) \cap (u,\eta) \ge (0,0)\}$.
Consequently,

$$S(u,\eta(u)) = \beta\left(\frac{(q-u)\mathcal{E}-(e_b-\eta(u))}{\tau}\right)^{+} p + q + \alpha\,\mathbb{E}_{a,e_a,p}\left[V_\alpha(u+A, A, (\eta(u)+E_a)^-, E_a, P)\right]. \tag{48}$$

Then applying $S(u^*+1,\eta(u^*+1)) - S(u^*,\eta(u^*)) \ge 0$ and
$S(u^*-1,\eta(u^*-1)) - S(u^*,\eta(u^*)) \ge 0$, we get
$Z(u^*) \le \beta\frac{\mathcal{E}}{\tau}p \le Z(u^*+1)$. Next, using
Lemma [3], we reach the second half of the lemma.

Appendix I 
Proof of Lemma [8]

Following the proof of Lemma [7], we can prove the first
half of the lemma by contradiction. Specifically, suppose
$u = q - \min\{q,M\}$ is not the optimal solution; then
$S(u^*-1,\eta(u^*-1)) - S(u^*,\eta(u^*)) \ge 0$ should hold. We have
$Z(q-\min\{q,M\}) \le Z(u^*) \le \beta\frac{\mathcal{E}}{\tau}p$, and a
contradiction occurs. The second half of the lemma can be verified
similarly by contradiction. Assume $u = q$ is not the optimal
solution; then $S(u^*+1,\eta(u^*+1)) - S(u^*,\eta(u^*)) \ge 0$ should
be satisfied. Consequently, we get
$Z(q) \ge Z(u^*+1) \ge \beta\frac{\mathcal{E}}{\tau}p$, and a
contradiction occurs.
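
Taken together, the interior condition $Z(u^*) \le \beta\frac{\mathcal{E}}{\tau}p \le Z(u^*+1)$ and the two boundary cases suggest a simple threshold search. The sketch below is only an illustration: it treats $Z$ as a given non-decreasing function (its definition is in the main text and is not reproduced here), takes the feasible range as $u \in \{q-\min\{q,M\},\dots,q\}$, and breaks ties toward the larger $u$. The name `choose_remaining_queue` and the example `Z` are hypothetical.

```python
def choose_remaining_queue(Z, q, M, threshold):
    """Pick u (demands left uncharged) by comparing Z(u) with a threshold.

    Returns the largest feasible u with Z(u) <= threshold. If no feasible u
    satisfies it, returns the lower end of the range, i.e. charge as many
    demands as possible.
    """
    low = q - min(q, M)
    best = low
    for u in range(low, q + 1):
        if Z(u) <= threshold:
            best = u
    return best


# Hypothetical non-decreasing Z used only to exercise the three regimes.
Z = lambda u: float(u)
print(choose_remaining_queue(Z, q=5, M=10, threshold=2.5))   # interior case
print(choose_remaining_queue(Z, q=5, M=10, threshold=10.0))  # charge nothing
print(choose_remaining_queue(Z, q=5, M=10, threshold=-1.0))  # charge the maximum
```

Here `threshold` plays the role of $\beta\frac{\mathcal{E}}{\tau}p$: a high price or a large multiplier pushes the choice toward leaving demands in the queue, and a low one toward charging as many as the charge points allow.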